S04 T01: Graphical visualization of a dataset

Description:

Complement the data exploration techniques with graphical visualization, using the Matplotlib and Seaborn libraries.

Level 1

Create at least one visualization for:

Distribution of total delay

The plot shows that the majority of delays are within 25 minutes, and the number of flights drops sharply as the delay duration increases.

Major delays are therefore few, and the very high values are effectively outliers. Reducing the minor delays is what would do most to optimise air transport.
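A histogram is one way to produce this kind of view. The sketch below is only illustrative: it assumes a DataFrame with a hypothetical `TotalDelay` column in minutes, and the values are toy data, not the dataset's.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data (hypothetical): total delay per flight, in minutes.
df = pd.DataFrame(
    {"TotalDelay": [3, 5, 8, 10, 12, 15, 18, 20, 22, 24, 30, 45, 60, 120, 300]}
)

# Share of flights whose total delay is at most 25 minutes.
share_within_25 = (df["TotalDelay"] <= 25).mean()

# Histogram of the delay distribution with 25-minute-wide bins.
fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(df["TotalDelay"], bins=range(0, 326, 25), edgecolor="black")
ax.set_xlabel("Total delay (minutes)")
ax.set_ylabel("Number of flights")
ax.set_title("Distribution of total delay")
fig.tight_layout()
```

Computing `share_within_25` alongside the plot gives a number to back the visual impression that most delays are short.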

Distribution of total delay by unique carrier

The bar plot makes it clear that carrier YV contributes the most to the delays, followed by OO and UA.
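A ranking like this can be obtained by summing the delay per carrier before plotting. The following is a minimal sketch under assumptions: the column names `UniqueCarrier` and `TotalDelay` are hypothetical, and the toy values were chosen only so that an ordering is visible.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data (hypothetical values and column names).
df = pd.DataFrame({
    "UniqueCarrier": ["YV", "OO", "UA", "YV", "OO", "YV", "UA", "AA"],
    "TotalDelay":    [120,  60,   45,   90,   70,   30,   40,   20],
})

# Sum the delay per carrier and sort so the largest contributor comes first.
delay_by_carrier = (
    df.groupby("UniqueCarrier")["TotalDelay"].sum().sort_values(ascending=False)
)

# Bar plot of total delay per carrier, in descending order.
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(delay_by_carrier.index, delay_by_carrier.values)
ax.set_xlabel("Unique carrier")
ax.set_ylabel("Total delay (minutes)")
ax.set_title("Total delay by unique carrier")
fig.tight_layout()
```

Sorting before plotting keeps the bars in rank order, which makes the leading carriers easy to read off.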

Bar plot of total delay by unique carrier on a monthly basis

The graph shows that most of the delays in October and November belong to PS flights. Only in December does carrier CO account for the majority of delays. The result is unsurprising given that, as seen above, carrier PS contributes the most to the total delay, followed by CO.

Another interesting fact: of the three months, Month 11 (November) has the highest number of delays.
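The monthly comparison can be sketched by counting delayed flights per month and carrier. This is an assumption-laden illustration: each row stands for one delayed flight, the column names `Month` and `UniqueCarrier` are hypothetical, and the toy rows are not taken from the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data (hypothetical): one row per delayed flight.
df = pd.DataFrame({
    "Month":         [10, 10, 10, 11, 11, 11, 11, 12, 12, 12],
    "UniqueCarrier": ["PS", "PS", "CO", "PS", "PS", "CO", "PS", "CO", "CO", "PS"],
})

# Count delays per (month, carrier): rows are months, columns are carriers.
counts = df.groupby(["Month", "UniqueCarrier"]).size().unstack(fill_value=0)

# Carrier with the most delays in each month.
top_carrier = counts.idxmax(axis=1)

# Month with the highest number of delays overall.
busiest_month = counts.sum(axis=1).idxmax()

# Grouped bar plot: one group per month, one bar per carrier.
ax = counts.plot(kind="bar", figsize=(8, 4))
ax.set_xlabel("Month")
ax.set_ylabel("Number of delays")
ax.set_title("Delays by unique carrier per month")
plt.tight_layout()
```

Reading `top_carrier` and `busiest_month` from the same table that feeds the plot keeps the written conclusions tied to the figure.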

UNIQUE CARRIER CODE

Convert Month from integer (numeric variable) to month name (categorical variable)

Since the abbreviated month names are the first three letters of the full names, we can first convert the Month column to datetime, then use dt.month_name() to get the full month name, and finally use the str.slice() method to keep the first three letters, all in pandas and in a single line of code:
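As a hedged illustration of this chain, here is a minimal sketch assuming a DataFrame `df` with an integer `Month` column. The toy values and the `MonthName` column name are hypothetical, and the cast to string is a defensive choice so the `%m` format is parsed as text.

```python
import pandas as pd

# Toy data (hypothetical): month numbers as integers.
df = pd.DataFrame({"Month": [10, 11, 12, 1]})

# Integer -> datetime -> full month name -> first three letters.
df["MonthName"] = (
    pd.to_datetime(df["Month"].astype(str), format="%m")
    .dt.month_name()
    .str.slice(stop=3)
)
```

The resulting column holds strings. If the months should sort chronologically in plots, an ordered categorical dtype may be worth considering as a further step.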